Statistics For Economics
1. Introduction 2. Collection Of Data 3. Organisation Of Data
4. Presentation Of Data 5. Measures Of Central Tendency 6. Correlation
7. Index Numbers 8. Use Of Statistical Tools



Chapter 6 Correlation



In previous chapters, you have learned how to summarize large datasets and describe changes in variables. This chapter introduces the concept of **correlation**, which allows us to examine the relationship or association between two different variables. Understanding correlation helps us determine if the value of one variable tends to change when the value of another variable changes, whether they move in the same or opposite directions, and the strength of this relationship.

Introduction

Having covered data summarization and describing changes in single variables, this chapter focuses on exploring the connection between two distinct variables. The goal is to understand how two variables might influence or move together.




Types Of Relationship

Relationships between variables can take various forms. Some might suggest a cause-and-effect link (causation), like the relationship between the price of a commodity and the quantity demanded. Lower prices often lead to higher demand, while higher prices lead to lower demand. Similarly, low rainfall can be related to low agricultural productivity.

However, correlation does not necessarily imply causation. Some relationships may be purely coincidental, like the arrival of migratory birds and birth rates in a locality. Shoe size and the amount of money in your pocket is another pair of variables with no logical connection.

In other cases, a third variable might be influencing the relationship between two variables. For instance, a high number of ice cream sales might coincide with a higher number of deaths due to drowning. This isn't because eating ice cream causes drowning. Instead, a third factor, like rising temperature, leads both to increased ice cream sales and more people going swimming, potentially increasing drowning incidents. Thus, temperature is the underlying cause of the observed relationship between ice cream sales and drowning deaths.


What Does Correlation Measure?

Correlation is a statistical measure that studies and quantifies the **direction** and **intensity** of the relationship between variables. It specifically measures **covariation**, meaning how two variables tend to move together, but not necessarily a cause-and-effect relationship (causation).

If a correlation exists between two variables, say X and Y, it means that when the value of X changes in a certain direction, the value of Y tends to change in a predictable way – either in the same direction (positive correlation) or the opposite direction (negative correlation).

For simplicity, correlation analysis often focuses on **linear relationships**, where the movement between the two variables can be approximated by a straight line when plotted on a graph.


Types Of Correlation

Correlation is commonly classified into two main types:

- **Positive correlation**: the two variables move in the same direction; when one increases, the other also tends to increase (for example, rainfall and agricultural productivity).
- **Negative correlation**: the two variables move in opposite directions; when one increases, the other tends to decrease (for example, the price of a commodity and the quantity demanded).


Techniques For Measuring Correlation

Several tools are used to study and measure correlation:

- Scatter diagram
- Karl Pearson's coefficient of correlation
- Spearman's rank correlation coefficient

Scatter Diagram

A scatter diagram is a simple but useful graphical technique for visualizing the form of the relationship between two variables. Pairs of values for the two variables are plotted as points on a graph. The pattern and closeness of these points give a visual impression of the nature and strength of the relationship.

Fig. 6.1: Scatter diagram showing a positive correlation.
Fig. 6.2: Scatter diagram showing a negative correlation.
Fig. 6.3: Scatter diagram showing no correlation.
Fig. 6.4: Scatter diagram showing perfect positive correlation.
Fig. 6.5: Scatter diagram showing perfect negative correlation.

The degree of closeness of the points to a line indicates the strength of the correlation: closer points suggest stronger correlation, dispersed points suggest weaker correlation. If the points follow a straight line, the relationship is linear. If they follow a curved pattern (Fig. 6.6, Fig. 6.7), the relationship is non-linear.

Fig. 6.6: Scatter diagram showing a positive non-linear relation.
Fig. 6.7: Scatter diagram showing a negative non-linear relation.

Karl Pearson’s Coefficient Of Correlation

This is a numerical measure that provides a precise value for the degree of **linear** relationship between two quantitative variables, X and Y. It is also known as the product moment correlation coefficient and is denoted by $r$.

The formula for Karl Pearson's coefficient of correlation is:

$r = \frac{\text{Cov}(X,Y)}{\sigma_x \sigma_y}$

Where Cov(X,Y) is the covariance between X and Y, and $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y, respectively.

Covariance is given by $\text{Cov}(X,Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$, where $\bar{X}$ and $\bar{Y}$ are the means, and N is the number of observations.

Alternative formulas for $r$ based on raw data or deviations:

$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2} \sqrt{\sum (Y - \bar{Y})^2}}$

$r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \sqrt{N \sum Y^2 - (\sum Y)^2}}$

It is essential to use Pearson's $r$ only when a scatter diagram suggests a linear relationship. Calculating $r$ for a non-linear relationship can be misleading.


Properties Of Correlation Coefficient

The correlation coefficient $r$ has several important properties:

- $r$ has no unit; it is a pure number, unaffected by the units in which X and Y are measured.
- $r$ is unaffected by a change of origin or scale (this property is the basis of the step deviation method described later).
- A positive value of $r$ indicates that the two variables move in the same direction; a negative value indicates an inverse relation.
- If $r = 0$, the two variables are uncorrelated, i.e. there is no linear relation between them.
- The value of $r$ always lies between $-1$ and $+1$: $r = +1$ indicates a perfect positive linear relationship and $r = -1$ a perfect negative one. The closer $|r|$ is to 1, the stronger the linear relationship; values near 0 indicate a weak one.

Finally, correlation measures covariation, not causation. A positive correlation between deaths and doctors during an epidemic might occur if doctors are sent to severely affected areas (both variables being influenced by a third, the severity of the epidemic), not because doctors cause deaths.

Example 1. Calculate the correlation coefficient between No. of years of schooling of farmers (X) and Annual yield per acre in ’000 (Rs) (Y).

X: 0, 2, 4, 6, 8, 10, 12

Y: 4, 4, 6, 10, 10, 8, 7

Answer:

| Years of Education (X) | $X-\bar{X}$ | $(X-\bar{X})^2$ | Annual yield (Y) | $Y-\bar{Y}$ | $(Y-\bar{Y})^2$ | $(X-\bar{X})(Y-\bar{Y})$ |
|---|---|---|---|---|---|---|
| 0 | –6 | 36 | 4 | –3 | 9 | 18 |
| 2 | –4 | 16 | 4 | –3 | 9 | 12 |
| 4 | –2 | 4 | 6 | –1 | 1 | 2 |
| 6 | 0 | 0 | 10 | 3 | 9 | 0 |
| 8 | 2 | 4 | 10 | 3 | 9 | 6 |
| 10 | 4 | 16 | 8 | 1 | 1 | 4 |
| 12 | 6 | 36 | 7 | 0 | 0 | 0 |
| $\sum X=42$ | $\sum (X-\bar{X})=0$ | $\sum (X-\bar{X})^2=112$ | $\sum Y=49$ | $\sum (Y-\bar{Y})=0$ | $\sum (Y-\bar{Y})^2=38$ | $\sum (X-\bar{X})(Y-\bar{Y})=42$ |

N = 7.

$\bar{X} = \frac{42}{7} = 6$, $\bar{Y} = \frac{49}{7} = 7$.

$\sigma_x = \sqrt{\frac{\sum (X– \bar{X})^2}{N}} = \sqrt{\frac{112}{7}} = \sqrt{16} = 4$.

$\sigma_y = \sqrt{\frac{\sum (Y– \bar{Y})^2}{N}} = \sqrt{\frac{38}{7}} \approx \sqrt{5.428} \approx 2.33$

Cov(X,Y) = $\frac{\sum (X– \bar{X})(Y– \bar{Y})}{N} = \frac{42}{7} = 6$.

$r = \frac{\text{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{6}{4 \times 2.33} = \frac{6}{9.32} \approx 0.644$

There is a positive correlation of approximately 0.644 between years of schooling and yield per acre.
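
The calculation above can be checked with a short script. This is a minimal sketch (the function name `pearson_r` is our own), using the same population (divide-by-N) covariance and standard deviations as the worked solution:

```python
# Pearson's r for Example 1 via r = Cov(X, Y) / (sigma_x * sigma_y),
# with N (not N - 1) in the denominators, matching the worked solution.

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / (var_x ** 0.5 * var_y ** 0.5)

X = [0, 2, 4, 6, 8, 10, 12]   # years of schooling
Y = [4, 4, 6, 10, 10, 8, 7]   # annual yield per acre ('000 Rs)
print(round(pearson_r(X, Y), 3))  # → 0.644
```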


Step Deviation Method To Calculate Correlation Coefficient

This method simplifies calculations for large values by transforming the variables X and Y using assumed means and common factors.

Let $U = \frac{X - A}{h}$ and $V = \frac{Y - B}{k}$, where A and B are assumed means, and h and k are common factors. The correlation coefficient between U and V ($r_{UV}$) is equal to the correlation coefficient between X and Y ($r_{XY}$).

$r_{XY} = r_{UV} = \frac{N \sum UV - (\sum U)(\sum V)}{\sqrt{N \sum U^2 - (\sum U)^2} \sqrt{N \sum V^2 - (\sum V)^2}}$

Example 2. Calculate the correlation coefficient between Price index (X) and Money supply in Rs crores (Y).

X: 120, 150, 190, 220, 230

Y: 1800, 2000, 2500, 2700, 3000

Answer:

Let A = 100, h = 10. Let B = 1700, k = 100.

$U = \frac{X - 100}{10}$, $V = \frac{Y - 1700}{100}$.

| X | Y | U | V | $U^2$ | $V^2$ | $UV$ |
|---|---|---|---|---|---|---|
| 120 | 1800 | 2 | 1 | 4 | 1 | 2 |
| 150 | 2000 | 5 | 3 | 25 | 9 | 15 |
| 190 | 2500 | 9 | 8 | 81 | 64 | 72 |
| 220 | 2700 | 12 | 10 | 144 | 100 | 120 |
| 230 | 3000 | 13 | 13 | 169 | 169 | 169 |
|  |  | $\sum U=41$ | $\sum V=35$ | $\sum U^2=423$ | $\sum V^2=343$ | $\sum UV=378$ |

N = 5.

$r = \frac{N \sum UV - (\sum U)(\sum V)}{\sqrt{N \sum U^2 - (\sum U)^2} \sqrt{N \sum V^2 - (\sum V)^2}} = \frac{5 \times 378 - (41)(35)}{\sqrt{5 \times 423 - (41)^2} \sqrt{5 \times 343 - (35)^2}}$

$r = \frac{1890 - 1435}{\sqrt{2115 - 1681} \sqrt{1715 - 1225}} = \frac{455}{\sqrt{434} \sqrt{490}} = \frac{455}{\sqrt{212660}} \approx \frac{455}{461.15} \approx 0.987$

There is a strong positive correlation (approximately 0.987) between the price index and money supply.
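
The key property the step deviation method relies on, that $r$ is unchanged by the linear transforms $U = (X-A)/h$ and $V = (Y-B)/k$, can be verified directly. A minimal sketch (the `pearson_r` helper is our own, using the raw-score formula from the text):

```python
# Step deviation method for Example 2: correlating the smaller U, V values
# gives exactly the same r as correlating X, Y, because r is invariant to
# changes of origin (A, B) and scale (h, k).

def pearson_r(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = ((n * sxx - sx * sx) ** 0.5) * ((n * syy - sy * sy) ** 0.5)
    return num / den

X = [120, 150, 190, 220, 230]          # price index
Y = [1800, 2000, 2500, 2700, 3000]     # money supply (Rs crores)
U = [(x - 100) // 10 for x in X]       # A = 100, h = 10
V = [(y - 1700) // 100 for y in Y]     # B = 1700, k = 100

print(round(pearson_r(X, Y), 3))  # ≈ 0.987
print(round(pearson_r(U, V), 3))  # same value: r is origin/scale invariant
```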


Spearman’s Rank Correlation

Developed by C.E. Spearman, this method measures the linear association between the **ranks** assigned to individual items or observations, rather than their actual values. It is particularly useful in situations where precise numerical measurement is difficult or impossible, such as measuring subjective attributes like honesty, beauty, or intelligence.

It can also be used when data has extreme values (as it's not affected by them) or when the relationship is non-linear but its direction is clear. The formula uses ranks (R):

$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$

Where $n$ is the number of observations/pairs and $D$ is the difference between the ranks assigned to the same item for the two variables ($D = R_x - R_y$).

The properties of Pearson's $r$ generally apply to Spearman's $r_s$ as well; it ranges from -1 to +1 and has no unit. However, $r_s$ is generally less accurate than Pearson's $r$ when precise quantitative data is available because it uses only rank information, not the actual magnitude of differences between values.


Calculation Of Rank Correlation Coefficient

Calculation of rank correlation depends on whether ranks are already provided, need to be assigned, or if there are ties (repeated ranks).


Case 1: When The Ranks Are Given

If ranks for both variables are given, calculate the difference in ranks (D) for each pair, square the differences ($D^2$), sum the squares ($\sum D^2$), and apply the formula $r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$.

Example 3. Five persons are assessed by three judges (A, B, C) in a beauty contest. Ranks given by each judge are provided. Find which pair of judges has the nearest approach to common perception of beauty.

Competitors (Rank by Judge A): 1, 2, 3, 4, 5

Competitors (Rank by Judge B): 2, 4, 1, 5, 3

Competitors (Rank by Judge C): 1, 3, 5, 2, 4

Answer:

Calculate rank correlation for each pair of judges (A vs B, A vs C, B vs C).

Judge A vs Judge B:

| Competitor | Rank by A ($R_A$) | Rank by B ($R_B$) | $D = R_A - R_B$ | $D^2$ |
|---|---|---|---|---|
| 1 | 1 | 2 | –1 | 1 |
| 2 | 2 | 4 | –2 | 4 |
| 3 | 3 | 1 | 2 | 4 |
| 4 | 4 | 5 | –1 | 1 |
| 5 | 5 | 3 | 2 | 4 |
|  |  |  | $\sum D=0$ | $\sum D^2=14$ |

$n=5$. $r_{AB} = 1 - \frac{6 \times 14}{5(5^2 - 1)} = 1 - \frac{84}{5 \times 24} = 1 - \frac{84}{120} = 1 - 0.7 = 0.3$.

Judge A vs Judge C:

| Competitor | Rank by A ($R_A$) | Rank by C ($R_C$) | $D = R_A - R_C$ | $D^2$ |
|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 |
| 2 | 2 | 3 | –1 | 1 |
| 3 | 3 | 5 | –2 | 4 |
| 4 | 4 | 2 | 2 | 4 |
| 5 | 5 | 4 | 1 | 1 |
|  |  |  | $\sum D=0$ | $\sum D^2=10$ |

$n=5$. $r_{AC} = 1 - \frac{6 \times 10}{5(5^2 - 1)} = 1 - \frac{60}{120} = 1 - 0.5 = 0.5$.

Judge B vs Judge C:

| Competitor | Rank by B ($R_B$) | Rank by C ($R_C$) | $D = R_B - R_C$ | $D^2$ |
|---|---|---|---|---|
| 1 | 2 | 1 | 1 | 1 |
| 2 | 4 | 3 | 1 | 1 |
| 3 | 1 | 5 | –4 | 16 |
| 4 | 5 | 2 | 3 | 9 |
| 5 | 3 | 4 | –1 | 1 |
|  |  |  | $\sum D=0$ | $\sum D^2=28$ |

$n=5$. $r_{BC} = 1 - \frac{6 \times 28}{5(5^2 - 1)} = 1 - \frac{168}{120} = 1 - 1.4 = -0.4$.

Comparing the coefficients: $r_{AB}=0.3$, $r_{AC}=0.5$, $r_{BC}=-0.4$. The pair with the highest positive rank correlation has the nearest approach to a common perception of beauty, so Judges A and C ($r_{AC}=0.5$) are closest. Judges B and C, whose coefficient is negative ($r_{BC}=-0.4$), have the most divergent tastes.
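
When ranks are already given, the formula reduces to a few lines of arithmetic on the rank differences. A minimal sketch for Example 3 (the function name `rank_corr` is our own):

```python
# Spearman's r_s from given ranks: r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1)),
# where D is the difference between the two ranks of each item.

def rank_corr(rank_x, rank_y):
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n * n - 1))

judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 4, 1, 5, 3]
judge_c = [1, 3, 5, 2, 4]

print(round(rank_corr(judge_a, judge_b), 2))  # → 0.3
print(round(rank_corr(judge_a, judge_c), 2))  # → 0.5
print(round(rank_corr(judge_b, judge_c), 2))  # → -0.4
```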


Case 2: When The Ranks Are Not Given

Assign ranks to the values of each variable independently. The highest value gets rank 1, the next highest rank 2, and so on. If two values are equal, they are given the average of the ranks they would have occupied. Once ranks are assigned (Rx and Ry), calculate D = Rx - Ry, $D^2$, $\sum D^2$, and use the formula $r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$.

Example 4. Calculate the rank correlation coefficient between marks secured by 5 students in Economics (Y) and Statistics (X): Student A(85, 60), B(60, 48), C(55, 49), D(65, 50), E(75, 55). (Marks in X, Marks in Y)

Answer:

Assign ranks:

| Student | Statistics (X) | Economics (Y) | Rank X ($R_X$) | Rank Y ($R_Y$) | $D = R_X - R_Y$ | $D^2$ |
|---|---|---|---|---|---|---|
| A | 85 | 60 | 1 | 1 | 0 | 0 |
| B | 60 | 48 | 4 | 5 | –1 | 1 |
| C | 55 | 49 | 5 | 4 | 1 | 1 |
| D | 65 | 50 | 3 | 3 | 0 | 0 |
| E | 75 | 55 | 2 | 2 | 0 | 0 |
|  |  |  |  |  | $\sum D=0$ | $\sum D^2=2$ |

$n=5$. $r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 2}{5(5^2 - 1)} = 1 - \frac{12}{120} = 1 - 0.1 = 0.9$.

There is a strong positive rank correlation (0.9).
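
The rank-assignment step itself can be automated. A minimal sketch for Example 4, assuming no tied values (the helper names `ranks_desc` and `spearman_rs` are our own; rank 1 goes to the highest value, as in the text):

```python
# Spearman's r_s when ranks must be assigned: rank each series from its raw
# values (rank 1 = highest), then apply the usual formula to the differences.

def ranks_desc(values):
    # Position of each value in the descending sort; assumes no ties.
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

stats_marks = [85, 60, 55, 65, 75]   # Statistics (X), students A..E
econ_marks = [60, 48, 49, 50, 55]    # Economics (Y), students A..E
print(spearman_rs(stats_marks, econ_marks))  # → 0.9
```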


Case 3: When The Ranks Are Repeated

When ranks are repeated (ties), assign the average rank to the tied values. A correction factor is then added to $\sum D^2$ in the formula: for each group of $m$ tied ranks, add $\frac{m(m^2 - 1)}{12}$ to $\sum D^2$.

$r_s = 1 - \frac{6 \left( \sum D^2 + \sum \frac{m(m^2-1)}{12} \right)}{n(n^2 - 1)}$

Example 5. Calculate the rank correlation coefficient between X and Y from the following data: X: 1200, 1150, 1000, 990, 800, 780, 760, 750, 730, 700, 620, 600; Y: 75, 65, 50, 100, 90, 85, 90, 40, 50, 60, 50, 75.

Answer:

Assign ranks. Tied values get the average rank. For X, no ties. For Y, 50 appears 3 times (ranks 9, 10, 11, avg = 10), 75 appears 2 times (ranks 5, 6, avg = 5.5), 90 appears 2 times (ranks 2, 3, avg = 2.5).

| X | Y | Rank X ($R_X$) | Rank Y ($R_Y$) | $D = R_X - R_Y$ | $D^2$ |
|---|---|---|---|---|---|
| 1200 | 75 | 1 | 5.5 | –4.5 | 20.25 |
| 1150 | 65 | 2 | 7 | –5 | 25.00 |
| 1000 | 50 | 3 | 10 | –7 | 49.00 |
| 990 | 100 | 4 | 1 | 3 | 9.00 |
| 800 | 90 | 5 | 2.5 | 2.5 | 6.25 |
| 780 | 85 | 6 | 4 | 2 | 4.00 |
| 760 | 90 | 7 | 2.5 | 4.5 | 20.25 |
| 750 | 40 | 8 | 12 | –4 | 16.00 |
| 730 | 50 | 9 | 10 | –1 | 1.00 |
| 700 | 60 | 10 | 8 | 2 | 4.00 |
| 620 | 50 | 11 | 10 | 1 | 1.00 |
| 600 | 75 | 12 | 5.5 | 6.5 | 42.25 |
|  |  |  |  |  | $\sum D^2=198.00$ |

$n=12$. Ties in Y: Value 50 (m=3), Value 75 (m=2), Value 90 (m=2).

Correction Factor = $\frac{3(3^2-1)}{12} + \frac{2(2^2-1)}{12} + \frac{2(2^2-1)}{12} = \frac{3(8)}{12} + \frac{2(3)}{12} + \frac{2(3)}{12} = \frac{24}{12} + \frac{6}{12} + \frac{6}{12} = 2 + 0.5 + 0.5 = 3$

$r_s = 1 - \frac{6 (\sum D^2 + \text{Correction Factor})}{n(n^2 - 1)} = 1 - \frac{6 (198 + 3)}{12(12^2 - 1)} = 1 - \frac{6 \times 201}{12 \times 143} = 1 - \frac{1206}{1716} \approx 1 - 0.7028 \approx 0.297$

There is a positive rank correlation of approximately 0.297 between X and Y.
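
The tie-handling steps (average ranks plus the correction factor) can be sketched as follows. This is an illustrative implementation, not a library routine; the helper names `avg_ranks_desc` and `spearman_tied` are our own:

```python
# Tie-corrected Spearman's r_s for Example 5: tied values share the average
# of the ranks they span, and each tie group of size m adds m(m^2 - 1)/12
# to sum(D^2) before the formula is applied.
from collections import Counter

def avg_ranks_desc(values):
    # Rank 1 = highest; tied values get the average of their rank positions.
    order = sorted(values, reverse=True)
    first = {}  # value -> first (1-based) position in the sorted order
    for i, v in enumerate(order):
        first.setdefault(v, i + 1)
    counts = Counter(values)
    return [first[v] + (counts[v] - 1) / 2 for v in values]

def spearman_tied(xs, ys):
    rx, ry = avg_ranks_desc(xs), avg_ranks_desc(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum(m * (m * m - 1) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (d2 + correction) / (n * (n * n - 1))

X = [1200, 1150, 1000, 990, 800, 780, 760, 750, 730, 700, 620, 600]
Y = [75, 65, 50, 100, 90, 85, 90, 40, 50, 60, 50, 75]
print(round(spearman_tied(X, Y), 3))  # → 0.297
```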




Conclusion

Correlation analysis provides techniques to study the relationship between two variables, particularly linear relationships. Scatter diagrams offer a visual representation. Karl Pearson's coefficient ($r$) and Spearman's rank correlation coefficient ($r_s$) provide numerical measures of linear association. Pearson's $r$ is used for precisely measured quantitative data, while Spearman's $r_s$ is suitable for ranked data, subjective attributes, or data with extreme values. It is important to remember that correlation indicates covariation, not causation. The knowledge of correlation helps understand the direction and intensity of how variables change together, but does not explain why they are related.

